Cross-Modal Learning for Sketch Visual Understanding.
PhD Thesis. As touch devices have rapidly proliferated, sketch has gained popularity as an alternative input to text descriptions and speech. Sketch has the advantage of being both informative and convenient, which has stimulated sketch-related research in areas such as sketch recognition, sketch segmentation, sketch-based image retrieval, and photo-to-sketch synthesis. Although these fields have been well studied, existing sketch works still struggle to align the sketch and photo domains, resulting in unsatisfactory quality for both fine-grained retrieval and synthesis between the sketch and photo modalities. To address these problems, this thesis proposes a series of novel works on free-hand sketch related tasks and offers insights to guide future research.
Sketch conveys fine-grained information, making fine-grained sketch-based image retrieval (FG-SBIR) one of the most important topics in sketch research. The basic solution to this task is to learn to exploit the informativeness of sketches and link it to other modalities. Beyond informativeness, semantic information is also important for understanding the sketch modality and linking it with other related modalities. In this thesis, we show that semantic information can effectively bridge the domain gap between the sketch and photo modalities. Based on this observation, we propose an attribute-aware deep framework that exploits attribute information to aid fine-grained SBIR. Text descriptions are considered another semantic alternative to attributes, with the advantage of being more flexible and natural, and are exploited in our proposed deep multi-task framework. The experimental study shows that semantic attribute information improves fine-grained SBIR performance by a large margin.
Sketch also has unique characteristics, such as containing temporal information. The sketch synthesis task requires understanding both the semantic meaning behind sketches and the sketching process. The semantic meaning of sketches has been well explored in sketch recognition and sketch retrieval challenges. However, the sketching process has largely been ignored, even though it is also very important for understanding the sketch modality, especially considering the unique temporal characteristics of sketches. In this thesis, we propose the first deep photo-to-sketch synthesis framework, which achieves good performance on the sketch synthesis task, as shown in the experiment section.
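Although the abstract gives no implementation detail here, the temporal view of sketching can be illustrated with a minimal, hypothetical PyTorch decoder that turns a photo embedding into a sequence of pen moves; the architecture, dimensions, and the (dx, dy, pen-lift) parameterisation are assumptions for illustration, not the framework proposed in the thesis.

```python
import torch
import torch.nn as nn

class PhotoToStrokeDecoder(nn.Module):
    """Toy illustration: decode a photo embedding into a temporal sequence of
    pen states (dx, dy, pen-lift), reflecting the sequential nature of sketching."""

    def __init__(self, photo_dim=512, hidden=256, steps=50):
        super().__init__()
        self.init_h = nn.Linear(photo_dim, hidden)
        self.rnn = nn.GRUCell(3, hidden)            # input: previous (dx, dy, pen) move
        self.out = nn.Linear(hidden, 3)
        self.steps = steps

    def forward(self, photo_feat):
        h = torch.tanh(self.init_h(photo_feat))     # (B, hidden) initial state from the photo
        point = photo_feat.new_zeros(photo_feat.size(0), 3)
        strokes = []
        for _ in range(self.steps):                 # one pen move per time step
            h = self.rnn(point, h)
            point = self.out(h)
            strokes.append(point)
        return torch.stack(strokes, dim=1)          # (B, steps, 3) stroke sequence

decoder = PhotoToStrokeDecoder()
seq = decoder(torch.randn(4, 512))                  # (4, 50, 3)
```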
Generalisability is an important criterion for judging whether existing methods can be applied to real-world scenarios, especially considering the difficulty and cost of collecting sketches and pairwise annotations. We therefore propose a generalised fine-grained SBIR framework. In detail, we follow a meta-learning strategy and train a hyper-network to generate instance-level classification weights for the subsequent matching network. The effectiveness of the proposed method is validated by extensive experimental results.
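To make the hyper-network idea more concrete, the following is a minimal PyTorch sketch in which a query sketch embedding is mapped to instance-level classification weights that score candidate photo embeddings within a training episode; the module names, dimensions, and loss are illustrative assumptions rather than the thesis implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperFGSBIR(nn.Module):
    """Illustrative sketch: a hyper-network maps a query sketch embedding to
    instance-level classification weights used to score candidate photos."""

    def __init__(self, feat_dim=512):
        super().__init__()
        # Shared encoder stand-in; in practice this would be a CNN backbone.
        self.encoder = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
        # Hyper-network: sketch embedding -> generated classifier weight vector.
        self.hyper = nn.Linear(feat_dim, feat_dim)

    def forward(self, sketch_feat, photo_feats):
        # sketch_feat: (B, D) query sketches; photo_feats: (B, N, D) candidate photos.
        q = self.encoder(sketch_feat)
        w = F.normalize(self.hyper(q), dim=-1)          # (B, D) instance-level weights
        p = F.normalize(self.encoder(photo_feats), dim=-1)
        # Each generated classifier scores its own episode of candidates.
        return torch.einsum('bd,bnd->bn', w, p)

# Episodic (meta-learning style) training step with cross-entropy over candidates.
model = HyperFGSBIR()
sketches, photos = torch.randn(8, 512), torch.randn(8, 20, 512)
target = torch.zeros(8, dtype=torch.long)               # assume the true photo is index 0
loss = F.cross_entropy(model(sketches, photos), target)
loss.backward()
```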
Deep Spatial-Semantic Attention for Fine-Grained Sketch-Based Image Retrieval
Human sketches are unique in being able to capture both the spatial topology of a visual object and its subtle appearance details. Fine-grained sketch-based image retrieval (FG-SBIR) leverages such fine-grained characteristics of sketches to conduct instance-level retrieval of photos. Nevertheless, human sketches are often highly abstract and iconic, resulting in severe misalignments with candidate photos which in turn make subtle visual detail matching difficult. Existing FG-SBIR approaches focus only on coarse holistic matching via deep cross-domain representation learning, yet ignore explicitly accounting for fine-grained details and their spatial context. In this paper, a novel deep FG-SBIR model is proposed which differs significantly from existing models in that: (1) it is spatially aware, achieved by introducing an attention module that is sensitive to the spatial position of visual details; (2) it combines coarse and fine semantic information via a shortcut connection fusion block; and (3) it models feature correlation and is robust to misalignments between the extracted features across the two domains by introducing a novel higher-order learnable energy function (HOLEF) based loss. Extensive experiments show that the proposed deep spatial-semantic attention model significantly outperforms the state-of-the-art.
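As a rough illustration of contribution (3), the snippet below sketches a triplet ranking loss in which the sketch-photo distance is a small learnable energy over squared feature differences instead of a fixed Euclidean metric; this is a simplified reading of the HOLEF idea, and the exact parameterisation, dimensions, and margin are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableEnergyTripletLoss(nn.Module):
    """Simplified stand-in for a HOLEF-style loss: the sketch-photo distance is a
    learnable function of the feature difference rather than a fixed metric."""

    def __init__(self, feat_dim=512, margin=0.3):
        super().__init__()
        self.energy = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 1))
        self.margin = margin

    def dist(self, a, b):
        # Learnable energy over the element-wise squared difference between features.
        return self.energy((a - b) ** 2).squeeze(-1)

    def forward(self, sketch, pos_photo, neg_photo):
        d_pos = self.dist(sketch, pos_photo)
        d_neg = self.dist(sketch, neg_photo)
        return F.relu(d_pos - d_neg + self.margin).mean()

criterion = LearnableEnergyTripletLoss()
s, p, n = torch.randn(16, 512), torch.randn(16, 512), torch.randn(16, 512)
criterion(s, p, n).backward()
```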
Generalizable Person Re-identification by Domain-Invariant Mapping Network
We aim to learn a domain-generalizable person re-identification (ReID) model. When such a model is trained on a set of source domains (ReID datasets collected from different camera networks), it can be directly applied to any new unseen dataset for effective ReID without any model updating. Despite its practical value in real-world deployments, generalizable ReID has seldom been studied. In this work, a novel deep ReID model termed Domain-Invariant Mapping Network (DIMN) is proposed. DIMN is designed to learn a mapping between a person image and its identity classifier, i.e., it produces a classifier using a single shot. To make the model domain-invariant, we follow a meta-learning pipeline and sample a subset of source domain training tasks during each training episode. However, the model is significantly different from conventional meta-learning methods in that: (1) no model updating is required for the target domain, (2) different training tasks share a memory bank for maintaining both scalability and discrimination ability, and (3) it can be used to match an arbitrary number of identities in a target domain. Extensive experiments on a newly proposed large-scale ReID domain generalization benchmark show that our DIMN significantly outperforms alternative domain generalization or meta-learning methods.
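A compressed, hypothetical illustration of the single-shot classifier-generation idea: a mapping network turns one gallery image's feature into an identity classifier weight, which is written into a memory bank so that probes can be scored against many identities without any model updating. The momentum update, dimensions, and module structure are assumptions, not the DIMN implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassifierMemory(nn.Module):
    """Toy sketch: map a single gallery feature to an identity classifier and
    store it in a memory bank that probe features are scored against."""

    def __init__(self, feat_dim=256, num_ids=1000, momentum=0.5):
        super().__init__()
        self.mapper = nn.Linear(feat_dim, feat_dim)      # feature -> classifier weight
        self.register_buffer('bank', torch.zeros(num_ids, feat_dim))
        self.momentum = momentum

    def write(self, gallery_feat, identity):
        # Single-shot classifier generation with a momentum update of the bank slot.
        w = F.normalize(self.mapper(gallery_feat), dim=-1)
        slot = self.bank[identity]
        self.bank[identity] = F.normalize(self.momentum * slot + (1 - self.momentum) * w, dim=-1)

    def score(self, probe_feat):
        # Probe logits against every stored identity classifier.
        return F.normalize(probe_feat, dim=-1) @ self.bank.t()

mem = ClassifierMemory()
with torch.no_grad():
    mem.write(torch.randn(256), identity=42)
logits = mem.score(torch.randn(4, 256))   # (4, 1000) scores over identities
```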
Deep Multi-task Attribute-driven Ranking for Fine-grained Sketch-based Image Retrieval
Fine-grained sketch-based image retrieval (SBIR) aims to go beyond conventional SBIR to perform instance-level cross-domain retrieval: finding the specific photo that matches an input sketch. Existing methods focus on designing/learning good features for cross-domain matching and/or learning cross-domain matching functions. However, they neglect the semantic aspect of retrieval, i.e., what meaningful object properties does a user try to encode in her/his sketch? We propose a fine-grained SBIR model that exploits semantic attributes and deep feature learning in a complementary way. Specifically, we perform multi-task deep learning with three objectives: retrieval by fine-grained ranking on a learned representation, attribute prediction, and attribute-level ranking. Simultaneously predicting semantic attributes and using such predictions in the ranking procedure helps retrieval results to be more semantically relevant. Importantly, the introduction of semantic attribute learning in the model allows for the elimination of the otherwise prohibitive cost of human annotations required for training a fine-grained deep ranking model. Experimental results demonstrate that our method outperforms the state-of-the-art on challenging fine-grained SBIR benchmarks while requiring less annotation.
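To make the three objectives concrete, here is a minimal sketch of a combined loss with a fine-grained triplet ranking term, an attribute prediction term, and an attribute-level ranking term over predicted attribute vectors; the loss weights, attribute encoding, and function signature are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def multitask_sbir_loss(feat_s, feat_p, feat_n,        # deep features: sketch, positive, negative
                        attr_logits, attr_labels,       # predicted / ground-truth binary attributes
                        attr_s, attr_p, attr_n,         # predicted attribute vectors per branch
                        margin=0.3, w_attr=0.5, w_attr_rank=0.5):
    # 1) Fine-grained ranking on the learned representation.
    rank = F.triplet_margin_loss(feat_s, feat_p, feat_n, margin=margin)
    # 2) Attribute prediction (multi-label binary attributes).
    attr_pred = F.binary_cross_entropy_with_logits(attr_logits, attr_labels)
    # 3) Attribute-level ranking: the true photo should be closer in attribute space.
    attr_rank = F.triplet_margin_loss(attr_s, attr_p, attr_n, margin=margin)
    return rank + w_attr * attr_pred + w_attr_rank * attr_rank

f = lambda *shape: torch.randn(*shape)
loss = multitask_sbir_loss(f(8, 512), f(8, 512), f(8, 512),
                           f(8, 40), torch.randint(0, 2, (8, 40)).float(),
                           f(8, 40), f(8, 40), f(8, 40))
```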
Deep Factorised Inverse-Sketching
Modelling human free-hand sketches has become topical recently, driven by practical applications such as fine-grained sketch-based image retrieval (FG-SBIR). Sketches are clearly related to photo edge-maps, but a human free-hand sketch of a photo is not simply a clean rendering of that photo's edge map. Instead there is a fundamental process of abstraction and iconic rendering, where overall geometry is warped and salient details are selectively included. In this paper we study this sketching process and attempt to invert it. We model this inversion by translating iconic free-hand sketches to contours that resemble more geometrically realistic projections of object boundaries, and separately factorise out the salient added details. This factorised re-representation makes it easier to match a free-hand sketch to a photo instance of an object. Specifically, we propose a novel unsupervised image style transfer model based on enforcing a cyclic embedding consistency constraint. A deep FG-SBIR model is then formulated to accommodate complementary discriminative detail from each factorised sketch for better matching with the corresponding photo. Our method is evaluated both qualitatively and quantitatively to demonstrate its superiority over a number of state-of-the-art alternatives for style transfer and FG-SBIR.
Comment: Accepted to ECCV 201
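The cyclic embedding consistency constraint can be illustrated in a few lines: translate a sketch to the contour domain and back, then penalise the round trip in a shared embedding space rather than in pixel space. The stand-in linear networks and the L1 penalty below are assumptions chosen for brevity, not the paper's actual generators.

```python
import torch
import torch.nn as nn

# Stand-in networks; real models would be convolutional encoders/generators.
embed = nn.Linear(1024, 128)          # shared embedding of flattened images
sketch2contour = nn.Linear(1024, 1024)
contour2sketch = nn.Linear(1024, 1024)

def cyclic_embedding_loss(sketch):
    contour = sketch2contour(sketch)   # translate to the contour domain
    recon = contour2sketch(contour)    # translate back to the sketch domain
    # Consistency is enforced between embeddings of the input and its round trip.
    return (embed(sketch) - embed(recon)).abs().mean()

loss = cyclic_embedding_loss(torch.randn(8, 1024))
loss.backward()
```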
Universal Sketch Perceptual Grouping
In this work we aim to develop a universal sketch grouper. That is, a grouper that can be applied to sketches of any category in any domain to group constituent strokes/segments into semantically meaningful object parts. The first obstacle to this goal is the lack of large-scale datasets with grouping annotation. To overcome this, we contribute the largest sketch perceptual grouping (SPG) dataset to date, consisting of 20,000 unique sketches evenly distributed over 25 object categories. Furthermore, we propose a novel deep universal perceptual grouping model. The model is learned with both generative and discriminative losses. The generative losses improve the generalisation ability of the model to unseen object categories and datasets. The discriminative losses include a local grouping loss and a novel global grouping loss to enforce global grouping consistency. We show that the proposed model significantly outperforms the state-of-the-art groupers. Further, we show that our grouper is useful for a number of sketch analysis tasks including sketch synthesis and fine-grained sketch-based image retrieval (FG-SBIR). © Springer Nature Switzerland AG 2018
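As an illustration of the two kinds of grouping losses, the sketch below scores a predicted stroke-pair affinity matrix with a local per-pair binary term against ground-truth groupings, plus a global term that penalises transitivity violations (if strokes i,j and j,k are grouped, then i,k should be too). The global term here is one plausible consistency formulation chosen for clarity, not necessarily the paper's exact global grouping loss.

```python
import torch
import torch.nn.functional as F

def grouping_losses(affinity_logits, same_group):
    """affinity_logits: (N, N) predicted stroke-pair scores.
    same_group: (N, N) binary ground truth (1 if two strokes share a part)."""
    # Local grouping loss: per-pair binary classification.
    local = F.binary_cross_entropy_with_logits(affinity_logits, same_group)
    # Global grouping consistency (illustrative): transitivity violation penalty.
    a = torch.sigmoid(affinity_logits)
    violation = F.relu(a.unsqueeze(2) * a.unsqueeze(0) - a.unsqueeze(1))  # a_ij * a_jk - a_ik
    return local + violation.mean()

n = 12
logits = torch.randn(n, n, requires_grad=True)
gt = (torch.rand(n, n) > 0.5).float()
grouping_losses(logits, gt).backward()
```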
On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks
Learning-based methods to solve dense 3D vision problems typically train on 3D sensor data. Each measurement principle has its own advantages and drawbacks, which are typically not compared or discussed in the literature due to a lack of multi-modal datasets. Texture-less regions are problematic for structure from motion and stereo, reflective material poses issues for active sensing, and distances for translucent objects are intricate to measure with existing hardware. Training on inaccurate or corrupt data induces model bias and hampers generalisation capabilities. These effects remain unnoticed if the sensor measurement is considered as ground truth during the evaluation. This paper investigates the effect of sensor errors on the dense 3D vision tasks of depth estimation and reconstruction. We rigorously show the significant impact of sensor characteristics on the learned predictions and notice generalisation issues arising from various technologies in everyday household environments. For evaluation, we introduce a carefully designed dataset (available at https://github.com/Junggy/HAMMER-dataset) comprising measurements from commodity sensors, namely D-ToF, I-ToF, passive/active stereo, and monocular RGB+P. Our study quantifies the considerable sensor noise impact and paves the way to improved dense vision estimates and targeted data fusion.
Comment: Accepted at CVPR 2023, Main Paper + Supp. Mat. arXiv admin note: substantial text overlap with arXiv:2205.0456